Task - 1

In [22]:
!pip install scipy==1.8.1
!pip install clean-text
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: scipy==1.8.1 in /usr/local/lib/python3.8/dist-packages (1.8.1)
Requirement already satisfied: numpy<1.25.0,>=1.17.3 in /usr/local/lib/python3.8/dist-packages (from scipy==1.8.1) (1.21.6)
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting clean-text
  Downloading clean_text-0.6.0-py3-none-any.whl (11 kB)
Collecting emoji<2.0.0,>=1.0.0
  Downloading emoji-1.7.0.tar.gz (175 kB)
     |████████████████████████████████| 175 kB 5.1 MB/s 
Collecting ftfy<7.0,>=6.0
  Downloading ftfy-6.1.1-py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 1.5 MB/s 
Requirement already satisfied: wcwidth>=0.2.5 in /usr/local/lib/python3.8/dist-packages (from ftfy<7.0,>=6.0->clean-text) (0.2.5)
Building wheels for collected packages: emoji
  Building wheel for emoji (setup.py) ... done
  Created wheel for emoji: filename=emoji-1.7.0-py3-none-any.whl size=171046 sha256=0fb22982ca7df4cc5047f3e2e7f4822a28672f9869f1262c843effa4c474b88c
  Stored in directory: /root/.cache/pip/wheels/5e/8c/80/c3646df8201ba6f5070297fe3779a4b70265d0bfd961c15302
Successfully built emoji
Installing collected packages: ftfy, emoji, clean-text
Successfully installed clean-text-0.6.0 emoji-1.7.0 ftfy-6.1.1
In [2]:
from google.colab import files
file_data = files.upload()
Upload widget is only available when the cell has been executed in the current browser session. Please rerun this cell to enable.
Saving Keyword_data - Keyword_data.csv to Keyword_data - Keyword_data.csv
In [3]:
import pandas as pd
import numpy as np
import networkx as nx
import re
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import os
import warnings
warnings.filterwarnings('ignore')
In [4]:
df_keyword = pd.read_csv('Keyword_data - Keyword_data.csv')
df_keyword.shape
Out[4]:
(66, 13)

Clean the Dataset by dropping null columns

In [5]:
df_keyword.dropna(how = "all", inplace=True)
df_keyword.reset_index(drop=True)
# df_keyword = df_keyword.drop(columns = {'index'})
df_keyword
Out[5]:
Title Keyword 1 Keyword 2 Keyword 3 Keyword 4 Keyword 5 Keyword 6 Keyword 7 Keyword 8 Keyword 9 Keyword 10 Keyword 11 Keyword 12
0 Feb/03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Meta-Analyses of Financial Performance and Equ... EQUITY ORGANIZATIONAL sociology PERFORMANCE META-analysis PSYCHOMETRICS ORGANIZATIONAL research FINANCIAL performance AGENCY theory ORGANIZATIONAL effectiveness ORGANIZATIONAL behavior CORPORATE governance NaN
3 Home Country Environments, Corporate Diversifi... DIVERSIFICATION in industry BUSINESS planning PERFORMANCE standards EMPLOYEES -- Rating of CORPORATE culture STRATEGIC planning ORGANIZATIONAL effectiveness MANAGEMENT science MANAGEMENT research PRODUCT management NaN NaN
4 Safeguarding Investments in Asymmetric Interor... INTERORGANIZATIONAL relations INTERGROUP relations BUSINESS communication INVESTMENTS SUPPLY chains KNOWLEDGE management INTERORGANIZATIONAL networks CORPORATE governance GROUP decision making INTELLECTUAL capital NaN NaN
5 Managerialist and Human Capital Explanations f... EXECUTIVE compensation WAGES HUMAN capital LABOR economics PERSONNEL management MANAGEMENT science CONTINGENCY theory (Management) COMPENSATION management EXECUTIVE ability (Management) CORPORATE governance NaN NaN
6 Bidding Wars Over R&D-Intensive Firms: Knowled... KNOWLEDGE management INFORMATION resources management MANAGEMENT information systems BREAK-even analysis DATA mining MANAGEMENT science RESEARCH & development RESEARCH & development contracts CORPORATE governance DECISION making ORGANIZATIONAL behavior TRANSACTION costs
7 When “The Show Must Go On”: Surface Acting and... EMOTIONS (Psychology) INTERPERSONAL relations STRESS (Psychology) SOCIAL interaction SOCIAL psychology EMPLOYEES -- Attitudes CUSTOMER services CUSTOMER satisfaction JOB stress PEER review (Professional performance) NaN NaN
8 Relationships among Supervisors' and Subordina... SUPERVISORS JUSTICE CONFLICT management MEDIATION EMPLOYEES INDUSTRIAL relations ORGANIZATIONAL behavior UNITED States -- National Guard ORGANIZATIONAL effectiveness DECISION making RESOURCE allocation NaN
9 Punctuated Equilibrium and Linear Progression:... INDUSTRIAL relations MANAGEMENT science DECISION theory ORGANIZATIONAL sociology PUNCTUATED equilibrium (Evolution) ORGANIZATIONAL change ORGANIZATIONAL behavior ORGANIZATIONAL structure BUSINESS models ORGANIZATIONAL research NaN NaN
11 Apr/03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 The Relationship between Overconfidence and th... DECISION making EXECUTIVES INDUSTRIAL management NEW products HIGH technology industries NaN NaN NaN NaN NaN NaN NaN
14 Governance Through Ownership: Centuries of Pra... CORPORATE governance INDUSTRIAL management STOCKHOLDERS wealth INSTITUTIONAL investors WAGES NEW products ORGANIZATIONAL structure ORGANIZATIONAL behavior DECENTRALIZATION in management ORGANIZATIONAL effectiveness NaN NaN
15 Strategic Satisficing? A Behavioral-Agency The... EXECUTIVES STOCKHOLDERS wealth STOCK repurchasing CORPORATIONS -- Finance INCENTIVES in industry CORPORATE governance STRATEGIC planning EXECUTIVE ability (Management) AGENCY theory ORGANIZATIONAL behavior ORGANIZATIONAL effectiveness NaN
16 Exploring the Agency Consequences of Ownership... FAMILY-owned business enterprises DEBT DIRECTORS of corporations AGENCY theory ORGANIZATIONAL behavior ORGANIZATIONAL structure EMPLOYEE ownership CORPORATE governance DECISION making BOARDS of directors INDUSTRIAL relations NaN
17 Institutional Ownership Differences and Intern... INSTITUTIONAL investors DIVERSIFICATION in industry BUSINESS planning GLOBALIZATION BOARDS of directors INTERNATIONAL business enterprises FOREIGN investments PENSION trusts HIGH technology STRATEGIC planning TECHNOLOGICAL innovations INNOVATION adoption
18 Ownership Structures and R&D Investments of U.... RESEARCH & development INVESTMENTS PROPERTY INCENTIVES in industry AGENCY theory ORGANIZATIONAL sociology ORGANIZATIONAL structure STEWARDS NaN NaN NaN NaN
19 The Determinants of Executive Compensation in ... FAMILY-owned business enterprises CHIEF executive officers EXECUTIVE compensation BUSINESS enterprises RISK MUNICIPAL corporations CORPORATE governance RESEARCH & development ORGANIZATIONAL behavior ORGANIZATIONAL structure NaN NaN
20 Ownership Structure, Expropriation, and Perfor... PROPERTY PERFORMANCE STOCKHOLDERS PROFIT MINORITY stockholders EMINENT domain ORGANIZATIONAL effectiveness ORGANIZATIONAL structure CORPORATE governance NaN NaN NaN
21 CEO Stock Options: The Silent Dimension of Own... STOCK options STOCKS (Finance) CHIEF executive officers STOCK ownership EXECUTIVE compensation EMPLOYEE stock options ORGANIZATIONAL structure ORGANIZATIONAL effectiveness DECISION making RISK management in business NaN NaN
23 Jun/03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25 Assessing Creativity in Hollywood Pitch Meetin... MANAGEMENT science DECISION making SCREENWRITERS CREATIVE ability CREATIVE ability in business SOCIAL judgment theory (Communication) MOTION picture authorship SELF-perception ORGANIZATIONAL behavior QUALITY of products NaN NaN
26 Reactions to Perceived Inequity in U.S. and Du... INTERORGANIZATIONAL relations INDUSTRIAL organization ORGANIZATIONAL behavior ORGANIZATIONAL effectiveness INTERGROUP relations ORGANIZATIONAL structure BUSINESS networks SUPPLIERS STRATEGIC alliances (Business) NaN NaN NaN
27 The Impact of Community Violence and an Organi... AGGRESSION (Psychology) VIOLENCE SOCIAL psychology ORGANIZATIONAL justice WORK environment INDUSTRIAL relations MANAGEMENT science VIOLENCE in the workplace ANGER in the workplace EMPLOYEES -- Attitudes PROBLEM employees WORK attitudes
28 Explaining New CEO Origin: Firm Versus Industr... CHIEF executive officers PERSONNEL changes SUCCESSION planning EXECUTIVE succession MANAGEMENT science EXECUTIVES -- Recruiting STRATEGIC planning MANAGEMENT research EXECUTIVE ability (Management) JOB qualifications ORGANIZATIONAL change NaN
29 Do High Job Demands Increase Intrinsic Motivat... MENTAL fatigue JOB stress INDUSTRIAL psychology BURNOUT (Psychology) SOCIAL networks PERSONNEL management MANAGEMENT science MOTIVATION (Psychology) INTRINSIC motivation JOB qualifications ORGANIZATIONAL behavior ORGANIZATIONAL effectiveness
30 Organizational Hiring Patterns, Interfirm Netw... PERSONNEL management PERSONNEL changes MANAGEMENT science INTERORGANIZATIONAL relations CONTAGION (Social psychology) TEAMS in the workplace EXECUTIVES -- Recruiting EMPLOYEE recruitment ORGANIZATIONAL sociology BUSINESS networks INTERORGANIZATIONAL networks NaN
31 The Effects of Centrifugal and Centripetal For... PRODUCT management NEW products PROBLEM solving QUALITY of products DECENTRALIZATION in management MARKETING management MANAGEMENT science PRODUCT design PRODUCT lines PRODUCT information management ORGANIZATIONAL behavior NaN
32 A Social Capital Model of High-Growth Ventures SOCIAL capital (Sociology) INFRASTRUCTURE (Economics) VENTURE capital INVESTMENTS GOING public (Securities) COMPETITIVE advantage ENTREPRENEURSHIP CAPITAL market RESOURCE management ORGANIZATIONAL effectiveness NaN NaN
34 Aug/03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
36 Transforming Work-Family Conflict into Commitm... ORGANIZATIONAL behavior MULTILEVEL marketing ORGANIZATIONAL commitment MARKETING management QUALITY of work life JOB satisfaction AMBIVALENCE ORGANIZATIONAL structure ORGANIZATIONAL effectiveness ORGANIZATIONAL sociology NaN NaN
37 Advocacy, Performance, and Threshold Influence... NEW products PERFORMANCE evaluation COMMERCIAL products PRODUCT management MARKETING DECISION making MARKETING -- Decision making RESEARCH & development STRATEGIC planning PRODUCT design NaN NaN
38 Managing from the Boundary: The Effective Lead... LEADERSHIP TEAMS in the workplace STRATEGIC planning SELF-management (Psychology) MANAGEMENT -- Employee participation CRITICAL incident technique TASK analysis MANAGEMENT science EXECUTIVE ability (Management) DECISION making NaN NaN
39 Team Member Functional Background and Involvem... TEAMS in the workplace DECISION making CRITICAL thinking WORKFLOW MANAGEMENT DECENTRALIZATION in management MANAGEMENT science ORGANIZATIONAL behavior DELEGATION of authority GROUP decision making STRATEGIC business units NaN
40 Happy Together? How Using Nonstandard Workers ... LABOR supply LABOR organizing CONDUCT of life ORGANIZATIONAL behavior EMPLOYEE loyalty ORGANIZATIONAL commitment INDUSTRIAL relations ORGANIZATIONAL structure EMPLOYEES -- Attitudes PERSONNEL management NaN NaN
41 Interpersonal Aggression in Work Groups: Socia... EMPLOYEES -- Attitudes AGGRESSION (Psychology) TEAMS in the workplace SOCIAL influence INDIVIDUAL differences INTERPERSONAL relations SOCIAL context ORGANIZATIONAL behavior ORGANIZATIONAL structure WORK environment NaN NaN
42 Share Price Reactions to Work-Family Initiativ... WORK & family PERSONNEL management STOCKHOLDERS WOMEN employees STOCKS (Finance) -- Prices MANAGEMENT science HUMAN resource accounting WOMEN -- Employment ORGANIZATIONAL behavior QUALITY of work life NaN NaN
43 The Role of Human Capital in Postacquisition C... HUMAN capital CHIEF executive officers CAPITAL investments LABOR economics CONSOLIDATION & merger of corporations EXECUTIVES -- Dismissal of ORGANIZATIONAL effectiveness ORGANIZATIONAL behavior LABOR turnover EXECUTIVE succession NaN NaN
45 Oct/03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
47 How Much Should I Give and How Often? The Effe... SOCIAL status GENEROSITY BEHAVIORAL research LABOR productivity SOCIAL exchange INTERPERSONAL relations SOCIAL factors EMPLOYEES -- Attitudes -- Research PERSONNEL management ORGANIZATIONAL behavior NaN NaN
48 Self-Concordance at Work: Toward Understanding... LEADERSHIP EXECUTIVE ability (Management) EMPLOYEE motivation MOTIVATION (Psychology) INDUSTRIAL psychology MANAGEMENT science JOB satisfaction CHARISMATIC authority SELF-congruence MANAGEMENT styles NaN NaN
49 Cooperation, Competition, and Team Performance... EMPLOYEE motivation JOB performance TEAMS in the workplace INDUSTRIAL management PERSONNEL management ORGANIZATIONAL sociology INCENTIVES in industry INDUSTRIAL psychology GOAL setting in personnel management REWARD (Psychology) NaN NaN
50 The Impact Of Expectations On Newcomer Perform... TEAMS in the workplace ORGANIZATIONAL sociology EMPLOYEE motivation LEADERSHIP INTERPERSONAL relations INDUSTRIAL management PYGMALION (Greek mythology) GALATEA, sea nymph (Greek deity) SOCIAL exchange OCCUPATIONAL roles NaN NaN
51 THe Effects of Discontinuous Change on Latent ... ORGANIZATIONAL change EMPLOYEE rules HUMAN error RISK INDUSTRIAL management PERSONNEL management ORGANIZATIONAL behavior INDUSTRIAL psychology ORGANIZATIONAL research ERROR rates NaN NaN
52 Employee Creativity in Taiwan: An Application ... CREATIVE ability TAIWANESE EMPLOYEES PERSONNEL management EMPLOYEE motivation CREATIVE ability in business INNOVATION management CROSS-cultural differences NaN NaN NaN NaN
53 Media Legitimation Effects in the Market for I... GOING public (Securities) CORPORATE image STOCKHOLDERS -- Attitudes CAPITALISTS & financiers MASS media CORPORATIONS -- Investor relations MATHEMATICAL statistics CORPORATIONS -- Public relations PUBLIC companies TURNOVER (Business) NaN NaN
54 Giving Money to Get Money: How CEO Stock Optio... STOCK options GOING public (Securities) INCENTIVES in industry OPTIONS (Finance) CORPORATIONS -- Valuation CORPORATIONS -- Finance EXECUTIVE compensation CAPITALISTS & financiers BUSINESS enterprises -- Valuation DECISION making NaN NaN
56 Dec/03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
58 A Behavioral Theory of R&D Expenditures and In... ORGANIZATIONAL behavior CORPORATIONS -- Finance RESEARCH & development INDUSTRIAL management INNOVATIONS in business INNOVATION management BUSINESS planning SHIPBUILDING industry TECHNOLOGICAL innovations -- Economic aspects SUCCESS in business COMPETITIVE advantage ORGANIZATIONAL change
59 Transformational Leadership, Conservation, and... LEADERSHIP ORGANIZATIONAL behavior CREATIVE ability in business EMPLOYEE motivation ORGANIZATIONAL change WORK environment -- Psychological aspects MANAGEMENT EXECUTIVE ability (Management) INTRINSIC motivation INDUSTRIAL relations INDIVIDUAL differences NaN
60 Informational Dissimilarity and Organizational... ORGANIZATIONAL behavior TEAMS in the workplace INDUSTRIAL psychology ORGANIZATIONAL effectiveness ORGANIZATIONAL goals ORGANIZATIONAL sociology SOCIAL psychology MANAGEMENT ORGANIZATIONAL change DIVISION of labor INDUSTRIAL organization WORK environment
61 Subsidiary Staffing in Multinational Enterpris... INTERNATIONAL business enterprises -- Management FOREIGN subsidiaries -- Management EMPLOYEE selection EXECUTIVES -- Recruiting ORGANIZATIONAL sociology ORGANIZATIONAL behavior AGENCY theory RESOURCE-based theory of the firm PERSONNEL management EMPLOYMENT in foreign countries SUBSIDIARY corporations -- Management HOST countries (Business)
62 Strategic Human Resource Practices, Top Manage... PERSONNEL management COMPETITIVE advantage BUSINESS networks INDUSTRIAL management STRATEGIC planning SOCIAL networks RESOURCE management RESOURCE-based theory of the firm HUMAN capital -- Management INTELLECTUAL capital DECISION making INDUSTRIAL efficiency
63 Compensation Policy and Organizational Perform... COMPENSATION management ORGANIZATIONAL behavior PERSONNEL management HOSPITALS -- Administration MANAGEMENT FINANCIAL performance WAGE payment systems RESOURCE management ORGANIZATIONAL effectiveness INDUSTRIAL efficiency FINANCIAL management INDUSTRIAL management
64 Functional Background Identity, Diversity, and... CROSS-functional teams TEAMS in the workplace GROUP identity ORGANIZATIONAL behavior MANAGEMENT PERFORMANCE PERSONNEL management COMPETITIVE advantage ORGANIZATIONAL effectiveness GROUP decision making ORGANIZATIONAL structure ORGANIZATIONAL sociology
65 A Customer Interaction Approach to Strategy an... SERVICE industries -- Management CUSTOMER relations INDUSTRIAL management PRODUCTION management STRATEGIC planning CUSTOMER services LABOR process ORGANIZATIONAL behavior DECISION making CUSTOMER satisfaction CUSTOMER orientation MARKETING strategy
In [6]:
keywords_df = df_keyword.iloc[:,1:]
keywords_df
Out[6]:
Keyword 1 Keyword 2 Keyword 3 Keyword 4 Keyword 5 Keyword 6 Keyword 7 Keyword 8 Keyword 9 Keyword 10 Keyword 11 Keyword 12
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 EQUITY ORGANIZATIONAL sociology PERFORMANCE META-analysis PSYCHOMETRICS ORGANIZATIONAL research FINANCIAL performance AGENCY theory ORGANIZATIONAL effectiveness ORGANIZATIONAL behavior CORPORATE governance NaN
3 DIVERSIFICATION in industry BUSINESS planning PERFORMANCE standards EMPLOYEES -- Rating of CORPORATE culture STRATEGIC planning ORGANIZATIONAL effectiveness MANAGEMENT science MANAGEMENT research PRODUCT management NaN NaN
4 INTERORGANIZATIONAL relations INTERGROUP relations BUSINESS communication INVESTMENTS SUPPLY chains KNOWLEDGE management INTERORGANIZATIONAL networks CORPORATE governance GROUP decision making INTELLECTUAL capital NaN NaN
5 EXECUTIVE compensation WAGES HUMAN capital LABOR economics PERSONNEL management MANAGEMENT science CONTINGENCY theory (Management) COMPENSATION management EXECUTIVE ability (Management) CORPORATE governance NaN NaN
6 KNOWLEDGE management INFORMATION resources management MANAGEMENT information systems BREAK-even analysis DATA mining MANAGEMENT science RESEARCH & development RESEARCH & development contracts CORPORATE governance DECISION making ORGANIZATIONAL behavior TRANSACTION costs
7 EMOTIONS (Psychology) INTERPERSONAL relations STRESS (Psychology) SOCIAL interaction SOCIAL psychology EMPLOYEES -- Attitudes CUSTOMER services CUSTOMER satisfaction JOB stress PEER review (Professional performance) NaN NaN
8 SUPERVISORS JUSTICE CONFLICT management MEDIATION EMPLOYEES INDUSTRIAL relations ORGANIZATIONAL behavior UNITED States -- National Guard ORGANIZATIONAL effectiveness DECISION making RESOURCE allocation NaN
9 INDUSTRIAL relations MANAGEMENT science DECISION theory ORGANIZATIONAL sociology PUNCTUATED equilibrium (Evolution) ORGANIZATIONAL change ORGANIZATIONAL behavior ORGANIZATIONAL structure BUSINESS models ORGANIZATIONAL research NaN NaN
11 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 DECISION making EXECUTIVES INDUSTRIAL management NEW products HIGH technology industries NaN NaN NaN NaN NaN NaN NaN
14 CORPORATE governance INDUSTRIAL management STOCKHOLDERS wealth INSTITUTIONAL investors WAGES NEW products ORGANIZATIONAL structure ORGANIZATIONAL behavior DECENTRALIZATION in management ORGANIZATIONAL effectiveness NaN NaN
15 EXECUTIVES STOCKHOLDERS wealth STOCK repurchasing CORPORATIONS -- Finance INCENTIVES in industry CORPORATE governance STRATEGIC planning EXECUTIVE ability (Management) AGENCY theory ORGANIZATIONAL behavior ORGANIZATIONAL effectiveness NaN
16 FAMILY-owned business enterprises DEBT DIRECTORS of corporations AGENCY theory ORGANIZATIONAL behavior ORGANIZATIONAL structure EMPLOYEE ownership CORPORATE governance DECISION making BOARDS of directors INDUSTRIAL relations NaN
17 INSTITUTIONAL investors DIVERSIFICATION in industry BUSINESS planning GLOBALIZATION BOARDS of directors INTERNATIONAL business enterprises FOREIGN investments PENSION trusts HIGH technology STRATEGIC planning TECHNOLOGICAL innovations INNOVATION adoption
18 RESEARCH & development INVESTMENTS PROPERTY INCENTIVES in industry AGENCY theory ORGANIZATIONAL sociology ORGANIZATIONAL structure STEWARDS NaN NaN NaN NaN
19 FAMILY-owned business enterprises CHIEF executive officers EXECUTIVE compensation BUSINESS enterprises RISK MUNICIPAL corporations CORPORATE governance RESEARCH & development ORGANIZATIONAL behavior ORGANIZATIONAL structure NaN NaN
20 PROPERTY PERFORMANCE STOCKHOLDERS PROFIT MINORITY stockholders EMINENT domain ORGANIZATIONAL effectiveness ORGANIZATIONAL structure CORPORATE governance NaN NaN NaN
21 STOCK options STOCKS (Finance) CHIEF executive officers STOCK ownership EXECUTIVE compensation EMPLOYEE stock options ORGANIZATIONAL structure ORGANIZATIONAL effectiveness DECISION making RISK management in business NaN NaN
23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25 MANAGEMENT science DECISION making SCREENWRITERS CREATIVE ability CREATIVE ability in business SOCIAL judgment theory (Communication) MOTION picture authorship SELF-perception ORGANIZATIONAL behavior QUALITY of products NaN NaN
26 INTERORGANIZATIONAL relations INDUSTRIAL organization ORGANIZATIONAL behavior ORGANIZATIONAL effectiveness INTERGROUP relations ORGANIZATIONAL structure BUSINESS networks SUPPLIERS STRATEGIC alliances (Business) NaN NaN NaN
27 AGGRESSION (Psychology) VIOLENCE SOCIAL psychology ORGANIZATIONAL justice WORK environment INDUSTRIAL relations MANAGEMENT science VIOLENCE in the workplace ANGER in the workplace EMPLOYEES -- Attitudes PROBLEM employees WORK attitudes
28 CHIEF executive officers PERSONNEL changes SUCCESSION planning EXECUTIVE succession MANAGEMENT science EXECUTIVES -- Recruiting STRATEGIC planning MANAGEMENT research EXECUTIVE ability (Management) JOB qualifications ORGANIZATIONAL change NaN
29 MENTAL fatigue JOB stress INDUSTRIAL psychology BURNOUT (Psychology) SOCIAL networks PERSONNEL management MANAGEMENT science MOTIVATION (Psychology) INTRINSIC motivation JOB qualifications ORGANIZATIONAL behavior ORGANIZATIONAL effectiveness
30 PERSONNEL management PERSONNEL changes MANAGEMENT science INTERORGANIZATIONAL relations CONTAGION (Social psychology) TEAMS in the workplace EXECUTIVES -- Recruiting EMPLOYEE recruitment ORGANIZATIONAL sociology BUSINESS networks INTERORGANIZATIONAL networks NaN
31 PRODUCT management NEW products PROBLEM solving QUALITY of products DECENTRALIZATION in management MARKETING management MANAGEMENT science PRODUCT design PRODUCT lines PRODUCT information management ORGANIZATIONAL behavior NaN
32 SOCIAL capital (Sociology) INFRASTRUCTURE (Economics) VENTURE capital INVESTMENTS GOING public (Securities) COMPETITIVE advantage ENTREPRENEURSHIP CAPITAL market RESOURCE management ORGANIZATIONAL effectiveness NaN NaN
34 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
36 ORGANIZATIONAL behavior MULTILEVEL marketing ORGANIZATIONAL commitment MARKETING management QUALITY of work life JOB satisfaction AMBIVALENCE ORGANIZATIONAL structure ORGANIZATIONAL effectiveness ORGANIZATIONAL sociology NaN NaN
37 NEW products PERFORMANCE evaluation COMMERCIAL products PRODUCT management MARKETING DECISION making MARKETING -- Decision making RESEARCH & development STRATEGIC planning PRODUCT design NaN NaN
38 LEADERSHIP TEAMS in the workplace STRATEGIC planning SELF-management (Psychology) MANAGEMENT -- Employee participation CRITICAL incident technique TASK analysis MANAGEMENT science EXECUTIVE ability (Management) DECISION making NaN NaN
39 TEAMS in the workplace DECISION making CRITICAL thinking WORKFLOW MANAGEMENT DECENTRALIZATION in management MANAGEMENT science ORGANIZATIONAL behavior DELEGATION of authority GROUP decision making STRATEGIC business units NaN
40 LABOR supply LABOR organizing CONDUCT of life ORGANIZATIONAL behavior EMPLOYEE loyalty ORGANIZATIONAL commitment INDUSTRIAL relations ORGANIZATIONAL structure EMPLOYEES -- Attitudes PERSONNEL management NaN NaN
41 EMPLOYEES -- Attitudes AGGRESSION (Psychology) TEAMS in the workplace SOCIAL influence INDIVIDUAL differences INTERPERSONAL relations SOCIAL context ORGANIZATIONAL behavior ORGANIZATIONAL structure WORK environment NaN NaN
42 WORK & family PERSONNEL management STOCKHOLDERS WOMEN employees STOCKS (Finance) -- Prices MANAGEMENT science HUMAN resource accounting WOMEN -- Employment ORGANIZATIONAL behavior QUALITY of work life NaN NaN
43 HUMAN capital CHIEF executive officers CAPITAL investments LABOR economics CONSOLIDATION & merger of corporations EXECUTIVES -- Dismissal of ORGANIZATIONAL effectiveness ORGANIZATIONAL behavior LABOR turnover EXECUTIVE succession NaN NaN
45 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
47 SOCIAL status GENEROSITY BEHAVIORAL research LABOR productivity SOCIAL exchange INTERPERSONAL relations SOCIAL factors EMPLOYEES -- Attitudes -- Research PERSONNEL management ORGANIZATIONAL behavior NaN NaN
48 LEADERSHIP EXECUTIVE ability (Management) EMPLOYEE motivation MOTIVATION (Psychology) INDUSTRIAL psychology MANAGEMENT science JOB satisfaction CHARISMATIC authority SELF-congruence MANAGEMENT styles NaN NaN
49 EMPLOYEE motivation JOB performance TEAMS in the workplace INDUSTRIAL management PERSONNEL management ORGANIZATIONAL sociology INCENTIVES in industry INDUSTRIAL psychology GOAL setting in personnel management REWARD (Psychology) NaN NaN
50 TEAMS in the workplace ORGANIZATIONAL sociology EMPLOYEE motivation LEADERSHIP INTERPERSONAL relations INDUSTRIAL management PYGMALION (Greek mythology) GALATEA, sea nymph (Greek deity) SOCIAL exchange OCCUPATIONAL roles NaN NaN
51 ORGANIZATIONAL change EMPLOYEE rules HUMAN error RISK INDUSTRIAL management PERSONNEL management ORGANIZATIONAL behavior INDUSTRIAL psychology ORGANIZATIONAL research ERROR rates NaN NaN
52 CREATIVE ability TAIWANESE EMPLOYEES PERSONNEL management EMPLOYEE motivation CREATIVE ability in business INNOVATION management CROSS-cultural differences NaN NaN NaN NaN
53 GOING public (Securities) CORPORATE image STOCKHOLDERS -- Attitudes CAPITALISTS & financiers MASS media CORPORATIONS -- Investor relations MATHEMATICAL statistics CORPORATIONS -- Public relations PUBLIC companies TURNOVER (Business) NaN NaN
54 STOCK options GOING public (Securities) INCENTIVES in industry OPTIONS (Finance) CORPORATIONS -- Valuation CORPORATIONS -- Finance EXECUTIVE compensation CAPITALISTS & financiers BUSINESS enterprises -- Valuation DECISION making NaN NaN
56 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
58 ORGANIZATIONAL behavior CORPORATIONS -- Finance RESEARCH & development INDUSTRIAL management INNOVATIONS in business INNOVATION management BUSINESS planning SHIPBUILDING industry TECHNOLOGICAL innovations -- Economic aspects SUCCESS in business COMPETITIVE advantage ORGANIZATIONAL change
59 LEADERSHIP ORGANIZATIONAL behavior CREATIVE ability in business EMPLOYEE motivation ORGANIZATIONAL change WORK environment -- Psychological aspects MANAGEMENT EXECUTIVE ability (Management) INTRINSIC motivation INDUSTRIAL relations INDIVIDUAL differences NaN
60 ORGANIZATIONAL behavior TEAMS in the workplace INDUSTRIAL psychology ORGANIZATIONAL effectiveness ORGANIZATIONAL goals ORGANIZATIONAL sociology SOCIAL psychology MANAGEMENT ORGANIZATIONAL change DIVISION of labor INDUSTRIAL organization WORK environment
61 INTERNATIONAL business enterprises -- Management FOREIGN subsidiaries -- Management EMPLOYEE selection EXECUTIVES -- Recruiting ORGANIZATIONAL sociology ORGANIZATIONAL behavior AGENCY theory RESOURCE-based theory of the firm PERSONNEL management EMPLOYMENT in foreign countries SUBSIDIARY corporations -- Management HOST countries (Business)
62 PERSONNEL management COMPETITIVE advantage BUSINESS networks INDUSTRIAL management STRATEGIC planning SOCIAL networks RESOURCE management RESOURCE-based theory of the firm HUMAN capital -- Management INTELLECTUAL capital DECISION making INDUSTRIAL efficiency
63 COMPENSATION management ORGANIZATIONAL behavior PERSONNEL management HOSPITALS -- Administration MANAGEMENT FINANCIAL performance WAGE payment systems RESOURCE management ORGANIZATIONAL effectiveness INDUSTRIAL efficiency FINANCIAL management INDUSTRIAL management
64 CROSS-functional teams TEAMS in the workplace GROUP identity ORGANIZATIONAL behavior MANAGEMENT PERFORMANCE PERSONNEL management COMPETITIVE advantage ORGANIZATIONAL effectiveness GROUP decision making ORGANIZATIONAL structure ORGANIZATIONAL sociology
65 SERVICE industries -- Management CUSTOMER relations INDUSTRIAL management PRODUCTION management STRATEGIC planning CUSTOMER services LABOR process ORGANIZATIONAL behavior DECISION making CUSTOMER satisfaction CUSTOMER orientation MARKETING strategy

Find the Unique Keywords from the dataframe and create a List

In [7]:
list_kw = []
keywords_df.apply(lambda x:list_kw.append(x.values), axis = 0)
list_kw = list(set(np.concatenate(list_kw).flat))
len(list_kw)
unique_val = [element for element in list_kw if str(element) != "nan"]
print("Output list is:", unique_val)
Output list is: ['RESEARCH & development', 'CONDUCT of life', 'HUMAN capital -- Management', 'STEWARDS', 'AGGRESSION (Psychology)', 'VIOLENCE in the workplace', 'PROBLEM solving', 'SOCIAL exchange', 'BOARDS of directors', 'CROSS-functional teams', 'RESOURCE allocation', 'SUPERVISORS', 'RISK', 'HIGH technology industries', 'INDUSTRIAL management', 'CAPITALISTS & financiers', 'STOCK options', 'MUNICIPAL corporations', 'MARKETING', 'SERVICE industries -- Management', 'ENTREPRENEURSHIP', 'UNITED States -- National Guard', 'INDIVIDUAL differences', 'BUSINESS communication', 'LABOR supply', 'STOCKS (Finance)', 'INDUSTRIAL relations', 'ORGANIZATIONAL behavior', 'GLOBALIZATION', 'PRODUCT design', 'SUCCESS in business', 'SOCIAL influence', 'INDUSTRIAL efficiency', 'DELEGATION of authority', 'DECISION making', 'TECHNOLOGICAL innovations', 'VIOLENCE', 'CONFLICT management', 'WOMEN employees', 'MASS media', 'JOB satisfaction', 'STOCKS (Finance) -- Prices', 'RISK management in business', 'OPTIONS (Finance)', 'EMPLOYEE stock options', 'INDUSTRIAL organization', 'CORPORATIONS -- Public relations', 'QUALITY of work life', 'BUSINESS enterprises -- Valuation', 'COMPETITIVE advantage', 'WORK environment', 'MULTILEVEL marketing', 'MANAGEMENT research', 'EXECUTIVES', 'FOREIGN investments', 'SOCIAL context', 'SELF-management (Psychology)', 'WORK attitudes', 'MENTAL fatigue', 'PRODUCT management', 'BUSINESS planning', 'ORGANIZATIONAL structure', 'BUSINESS models', 'ORGANIZATIONAL goals', 'HIGH technology', 'MATHEMATICAL statistics', 'CUSTOMER orientation', 'INFRASTRUCTURE (Economics)', 'MANAGEMENT styles', 'CORPORATE image', 'LABOR turnover', 'PUNCTUATED equilibrium (Evolution)', 'TECHNOLOGICAL innovations -- Economic aspects', 'PUBLIC companies', 'BUSINESS networks', 'CORPORATIONS -- Investor relations', 'EMPLOYEE rules', 'FOREIGN subsidiaries -- Management', 'CUSTOMER relations', 'META-analysis', 'SOCIAL capital (Sociology)', 'HOSPITALS -- Administration', 'CUSTOMER services', 'STOCKHOLDERS', 'SOCIAL status', 'PSYCHOMETRICS', 'EMPLOYEE motivation', 'PEER review (Professional performance)', 'RESOURCE management', 'PENSION trusts', 'PRODUCTION management', 'PERFORMANCE', 'TASK analysis', 'CONSOLIDATION & merger of corporations', 'INSTITUTIONAL investors', 'ORGANIZATIONAL commitment', 'HUMAN error', 'STRESS (Psychology)', 'JOB performance', 'PERSONNEL changes', 'MANAGEMENT science', 'SCREENWRITERS', 'COMMERCIAL products', 'DATA mining', 'HOST countries (Business)', 'SOCIAL interaction', 'BREAK-even analysis', 'SELF-perception', 'OCCUPATIONAL roles', 'TRANSACTION costs', 'CHIEF executive officers', 'EMPLOYEES -- Attitudes -- Research', 'CUSTOMER satisfaction', 'INNOVATION management', 'INTELLECTUAL capital', 'EMOTIONS (Psychology)', 'PRODUCT lines', 'INTERNATIONAL business enterprises -- Management', 'INDUSTRIAL psychology', 'EXECUTIVE ability (Management)', 'PYGMALION (Greek mythology)', 'BEHAVIORAL research', 'PERFORMANCE standards', 'COMPENSATION management', 'SELF-congruence', 'DECENTRALIZATION in management', 'EXECUTIVE succession', 'RESEARCH & development contracts', 'PROBLEM employees', 'AGENCY theory', 'ORGANIZATIONAL change', 'GROUP decision making', 'ORGANIZATIONAL sociology', 'EMPLOYEES', 'QUALITY of products', 'STOCKHOLDERS -- Attitudes', 'CRITICAL thinking', 'CRITICAL incident technique', 'CORPORATE governance', 'REWARD (Psychology)', 'ORGANIZATIONAL effectiveness', 'INTRINSIC motivation', 'DIVERSIFICATION in industry', 'WORK environment -- Psychological aspects', 'DEBT', 'EXECUTIVE compensation', 'EMPLOYEE recruitment', 'SOCIAL judgment theory (Communication)', 'STOCKHOLDERS wealth', 'ORGANIZATIONAL research', 'SUCCESSION planning', 'CROSS-cultural differences', 'BURNOUT (Psychology)', 'GALATEA, sea nymph (Greek deity)', 'ANGER in the workplace', 'GENEROSITY', 'CAPITAL investments', 'GOAL setting in personnel management', 'WORK & family', 'INCENTIVES in industry', 'CAPITAL market', 'EMPLOYEE selection', 'CONTAGION (Social psychology)', 'BUSINESS enterprises', 'INTERORGANIZATIONAL networks', 'INFORMATION resources management', 'LABOR process', 'CORPORATIONS -- Finance', 'CORPORATE culture', 'TURNOVER (Business)', 'EXECUTIVES -- Dismissal of', 'WAGE payment systems', 'SOCIAL factors', 'MANAGEMENT information systems', 'STOCK ownership', 'INTERNATIONAL business enterprises', 'CORPORATIONS -- Valuation', 'TAIWANESE', 'MARKETING -- Decision making', 'KNOWLEDGE management', 'GROUP identity', 'INNOVATIONS in business', 'EXECUTIVES -- Recruiting', 'SOCIAL networks', 'SUPPLIERS', 'CREATIVE ability in business', 'INTERGROUP relations', 'STRATEGIC alliances (Business)', 'INVESTMENTS', 'INTERORGANIZATIONAL relations', 'RESOURCE-based theory of the firm', 'LABOR economics', 'EMPLOYEES -- Rating of', 'HUMAN resource accounting', 'FINANCIAL management', 'PROPERTY', 'EQUITY', 'CONTINGENCY theory (Management)', 'NEW products', 'EMINENT domain', 'MANAGEMENT', 'EMPLOYMENT in foreign countries', 'SUBSIDIARY corporations -- Management', 'MARKETING strategy', 'DIRECTORS of corporations', 'VENTURE capital', 'INTERPERSONAL relations', 'CHARISMATIC authority', 'SOCIAL psychology', 'HUMAN capital', 'JOB qualifications', 'MEDIATION', 'LABOR organizing', 'STOCK repurchasing', 'FINANCIAL performance', 'MINORITY stockholders', 'PERFORMANCE evaluation', 'EMPLOYEE ownership', 'PRODUCT information management', 'GOING public (Securities)', 'FAMILY-owned business enterprises', 'WOMEN -- Employment', 'MOTION picture authorship', 'MANAGEMENT -- Employee participation', 'SHIPBUILDING industry', 'LEADERSHIP', 'INNOVATION adoption', 'EMPLOYEE loyalty', 'SUPPLY chains', 'AMBIVALENCE', 'WORKFLOW', 'WAGES', 'ERROR rates', 'JUSTICE', 'PROFIT', 'TEAMS in the workplace', 'JOB stress', 'DECISION theory', 'STRATEGIC planning', 'MARKETING management', 'LABOR productivity', 'MOTIVATION (Psychology)', 'STRATEGIC business units', 'ORGANIZATIONAL justice', 'CREATIVE ability', 'DIVISION of labor', 'EMPLOYEES -- Attitudes', 'PERSONNEL management']
In [8]:
#convert the dataframe into dict values by considering first column as keys
mapped_list = df_keyword.set_index('Title').agg(list, axis=1).to_dict()
mapped_list
Out[8]:
{'Feb/03': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
 'Meta-Analyses of Financial Performance and Equity: Fusion or Confusion?': ['EQUITY',
  'ORGANIZATIONAL sociology',
  'PERFORMANCE',
  'META-analysis',
  'PSYCHOMETRICS',
  'ORGANIZATIONAL research',
  'FINANCIAL performance',
  'AGENCY theory',
  'ORGANIZATIONAL effectiveness',
  'ORGANIZATIONAL behavior',
  'CORPORATE governance',
  nan],
 'Home Country Environments, Corporate Diversification Strategies, and Firm Performance': ['DIVERSIFICATION in industry',
  'BUSINESS planning',
  'PERFORMANCE standards',
  'EMPLOYEES -- Rating of',
  'CORPORATE culture',
  'STRATEGIC planning',
  'ORGANIZATIONAL effectiveness',
  'MANAGEMENT science',
  'MANAGEMENT research',
  'PRODUCT management',
  nan,
  nan],
 'Safeguarding Investments in Asymmetric Interorganizational Relationships: Theory and Evidence': ['INTERORGANIZATIONAL relations',
  'INTERGROUP relations',
  'BUSINESS communication',
  'INVESTMENTS',
  'SUPPLY chains',
  'KNOWLEDGE management',
  'INTERORGANIZATIONAL networks',
  'CORPORATE governance',
  'GROUP decision making',
  'INTELLECTUAL capital',
  nan,
  nan],
 'Managerialist and Human Capital Explanations for Key Executive Pay Premiums: A Contingency Perspective': ['EXECUTIVE compensation',
  'WAGES',
  'HUMAN capital',
  'LABOR economics',
  'PERSONNEL management',
  'MANAGEMENT science',
  'CONTINGENCY theory (Management)',
  'COMPENSATION management',
  'EXECUTIVE ability (Management)',
  'CORPORATE governance',
  nan,
  nan],
 'Bidding Wars Over R&D-Intensive Firms: Knowledge, Opportunism, and the Market for Corporate Control': ['KNOWLEDGE management',
  'INFORMATION resources management',
  'MANAGEMENT information systems',
  'BREAK-even analysis',
  'DATA mining',
  'MANAGEMENT science',
  'RESEARCH & development',
  'RESEARCH & development contracts',
  'CORPORATE governance',
  'DECISION making',
  'ORGANIZATIONAL behavior',
  'TRANSACTION costs'],
 'When “The Show Must Go On”: Surface Acting and Deep Acting as Determinants of Emotional Exhaustion and Peer-Rated Service;Delivery': ['EMOTIONS (Psychology)',
  'INTERPERSONAL relations',
  'STRESS (Psychology)',
  'SOCIAL interaction',
  'SOCIAL psychology',
  'EMPLOYEES -- Attitudes',
  'CUSTOMER services',
  'CUSTOMER satisfaction',
  'JOB stress',
  'PEER review (Professional performance)',
  nan,
  nan],
 "Relationships among Supervisors' and Subordinates' Procedural Justice Perceptions and Organizational Citizenship Behaviors": ['SUPERVISORS',
  'JUSTICE',
  'CONFLICT management',
  'MEDIATION',
  'EMPLOYEES',
  'INDUSTRIAL relations',
  'ORGANIZATIONAL behavior',
  'UNITED States -- National Guard',
  'ORGANIZATIONAL effectiveness',
  'DECISION making',
  'RESOURCE allocation',
  nan],
 'Punctuated Equilibrium and Linear Progression: Toward a New Understanding of Group Development': ['INDUSTRIAL relations',
  'MANAGEMENT science',
  'DECISION theory',
  'ORGANIZATIONAL sociology',
  'PUNCTUATED equilibrium (Evolution)',
  'ORGANIZATIONAL change',
  'ORGANIZATIONAL behavior',
  'ORGANIZATIONAL structure',
  'BUSINESS models',
  'ORGANIZATIONAL research',
  nan,
  nan],
 'Apr/03': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
 'The Relationship between Overconfidence and the Introduction of Risky Products: Evidence from a Field Study': ['DECISION making',
  'EXECUTIVES',
  'INDUSTRIAL management',
  'NEW products',
  'HIGH technology industries',
  nan,
  nan,
  nan,
  nan,
  nan,
  nan,
  nan],
 'Governance Through Ownership: Centuries of Practice, Decades of Research': ['CORPORATE governance',
  'INDUSTRIAL management',
  'STOCKHOLDERS wealth',
  'INSTITUTIONAL investors',
  'WAGES',
  'NEW products',
  'ORGANIZATIONAL structure',
  'ORGANIZATIONAL behavior',
  'DECENTRALIZATION in management',
  'ORGANIZATIONAL effectiveness',
  nan,
  nan],
 'Strategic Satisficing? A Behavioral-Agency Theory Perspective on Stock Repurchase Program Announcements': ['EXECUTIVES',
  'STOCKHOLDERS wealth',
  'STOCK repurchasing',
  'CORPORATIONS -- Finance',
  'INCENTIVES in industry',
  'CORPORATE governance',
  'STRATEGIC planning',
  'EXECUTIVE ability (Management)',
  'AGENCY theory',
  'ORGANIZATIONAL behavior',
  'ORGANIZATIONAL effectiveness',
  nan],
 'Exploring the Agency Consequences of Ownership Dispersion Among The Directors of Private Family Firms': ['FAMILY-owned business enterprises',
  'DEBT',
  'DIRECTORS of corporations',
  'AGENCY theory',
  'ORGANIZATIONAL behavior',
  'ORGANIZATIONAL structure',
  'EMPLOYEE ownership',
  'CORPORATE governance',
  'DECISION making',
  'BOARDS of directors',
  'INDUSTRIAL relations',
  nan],
 'Institutional Ownership Differences and International Diversification: The Effects of Boards of Directors and Technological;Opportunity': ['INSTITUTIONAL investors',
  'DIVERSIFICATION in industry',
  'BUSINESS planning',
  'GLOBALIZATION',
  'BOARDS of directors',
  'INTERNATIONAL business enterprises',
  'FOREIGN investments',
  'PENSION trusts',
  'HIGH technology',
  'STRATEGIC planning',
  'TECHNOLOGICAL innovations',
  'INNOVATION adoption'],
 'Ownership Structures and R&D Investments of U.S. and Japanese Firms: Agency and Stewardship Perspectives': ['RESEARCH & development',
  'INVESTMENTS',
  'PROPERTY',
  'INCENTIVES in industry',
  'AGENCY theory',
  'ORGANIZATIONAL sociology',
  'ORGANIZATIONAL structure',
  'STEWARDS',
  nan,
  nan,
  nan,
  nan],
 'The Determinants of Executive Compensation in Family-Controlled Public Corporations': ['FAMILY-owned business enterprises',
  'CHIEF executive officers',
  'EXECUTIVE compensation',
  'BUSINESS enterprises',
  'RISK',
  'MUNICIPAL corporations',
  'CORPORATE governance',
  'RESEARCH & development',
  'ORGANIZATIONAL behavior',
  'ORGANIZATIONAL structure',
  nan,
  nan],
 'Ownership Structure, Expropriation, and Performance of Group-Affiliated Companies in Korea': ['PROPERTY',
  'PERFORMANCE',
  'STOCKHOLDERS',
  'PROFIT',
  'MINORITY stockholders',
  'EMINENT domain',
  'ORGANIZATIONAL effectiveness',
  'ORGANIZATIONAL structure',
  'CORPORATE governance',
  nan,
  nan,
  nan],
 'CEO Stock Options: The Silent Dimension of Ownership': ['STOCK options',
  'STOCKS (Finance)',
  'CHIEF executive officers',
  'STOCK ownership',
  'EXECUTIVE compensation',
  'EMPLOYEE stock options',
  'ORGANIZATIONAL structure',
  'ORGANIZATIONAL effectiveness',
  'DECISION making',
  'RISK management in business',
  nan,
  nan],
 'Jun/03': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
 'Assessing Creativity in Hollywood Pitch Meetings: Evidence for a Dual-Process Model of Creativity Judgments': ['MANAGEMENT science',
  'DECISION making',
  'SCREENWRITERS',
  'CREATIVE ability',
  'CREATIVE ability in business',
  'SOCIAL judgment theory (Communication)',
  'MOTION picture authorship',
  'SELF-perception',
  'ORGANIZATIONAL behavior',
  'QUALITY of products',
  nan,
  nan],
 'Reactions to Perceived Inequity in U.S. and Dutch Interorganizational Relationships': ['INTERORGANIZATIONAL relations',
  'INDUSTRIAL organization',
  'ORGANIZATIONAL behavior',
  'ORGANIZATIONAL effectiveness',
  'INTERGROUP relations',
  'ORGANIZATIONAL structure',
  'BUSINESS networks',
  'SUPPLIERS',
  'STRATEGIC alliances (Business)',
  nan,
  nan,
  nan],
 "The Impact of Community Violence and an Organization's Procedural Justice Climate on Workplace Aggression": ['AGGRESSION (Psychology)',
  'VIOLENCE',
  'SOCIAL psychology',
  'ORGANIZATIONAL justice',
  'WORK environment',
  'INDUSTRIAL relations',
  'MANAGEMENT science',
  'VIOLENCE in the workplace',
  'ANGER in the workplace',
  'EMPLOYEES -- Attitudes',
  'PROBLEM employees',
  'WORK attitudes'],
 'Explaining New CEO Origin: Firm Versus Industry Antecedents': ['CHIEF executive officers',
  'PERSONNEL changes',
  'SUCCESSION planning',
  'EXECUTIVE succession',
  'MANAGEMENT science',
  'EXECUTIVES -- Recruiting',
  'STRATEGIC planning',
  'MANAGEMENT research',
  'EXECUTIVE ability (Management)',
  'JOB qualifications',
  'ORGANIZATIONAL change',
  nan],
 'Do High Job Demands Increase Intrinsic Motivation or Fatigue or Both? The Role of Job Control and Job Social Support': ['MENTAL fatigue',
  'JOB stress',
  'INDUSTRIAL psychology',
  'BURNOUT (Psychology)',
  'SOCIAL networks',
  'PERSONNEL management',
  'MANAGEMENT science',
  'MOTIVATION (Psychology)',
  'INTRINSIC motivation',
  'JOB qualifications',
  'ORGANIZATIONAL behavior',
  'ORGANIZATIONAL effectiveness'],
 'Organizational Hiring Patterns, Interfirm Network Ties, and Interorganizational Imitation': ['PERSONNEL management',
  'PERSONNEL changes',
  'MANAGEMENT science',
  'INTERORGANIZATIONAL relations',
  'CONTAGION (Social psychology)',
  'TEAMS in the workplace',
  'EXECUTIVES -- Recruiting',
  'EMPLOYEE recruitment',
  'ORGANIZATIONAL sociology',
  'BUSINESS networks',
  'INTERORGANIZATIONAL networks',
  nan],
 'The Effects of Centrifugal and Centripetal Forces on Product Development Speed and Quality: How Does Problem Solving Matter?': ['PRODUCT management',
  'NEW products',
  'PROBLEM solving',
  'QUALITY of products',
  'DECENTRALIZATION in management',
  'MARKETING management',
  'MANAGEMENT science',
  'PRODUCT design',
  'PRODUCT lines',
  'PRODUCT information management',
  'ORGANIZATIONAL behavior',
  nan],
 'A Social Capital Model of High-Growth Ventures': ['SOCIAL capital (Sociology)',
  'INFRASTRUCTURE (Economics)',
  'VENTURE capital',
  'INVESTMENTS',
  'GOING public (Securities)',
  'COMPETITIVE advantage',
  'ENTREPRENEURSHIP',
  'CAPITAL market',
  'RESOURCE management',
  'ORGANIZATIONAL effectiveness',
  nan,
  nan],
 'Aug/03': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
 'Transforming Work-Family Conflict into Commitment in Network Marketing Organizations': ['ORGANIZATIONAL behavior',
  'MULTILEVEL marketing',
  'ORGANIZATIONAL commitment',
  'MARKETING management',
  'QUALITY of work life',
  'JOB satisfaction',
  'AMBIVALENCE',
  'ORGANIZATIONAL structure',
  'ORGANIZATIONAL effectiveness',
  'ORGANIZATIONAL sociology',
  nan,
  nan],
 'Advocacy, Performance, and Threshold Influences on Decisions to Terminate New Product Development': ['NEW products',
  'PERFORMANCE evaluation',
  'COMMERCIAL products',
  'PRODUCT management',
  'MARKETING',
  'DECISION making',
  'MARKETING -- Decision making',
  'RESEARCH & development',
  'STRATEGIC planning',
  'PRODUCT design',
  nan,
  nan],
 'Managing from the Boundary: The Effective Leadership of Self-Managing Work Teams': ['LEADERSHIP',
  'TEAMS in the workplace',
  'STRATEGIC planning',
  'SELF-management (Psychology)',
  'MANAGEMENT -- Employee participation',
  'CRITICAL incident technique',
  'TASK analysis',
  'MANAGEMENT science',
  'EXECUTIVE ability (Management)',
  'DECISION making',
  nan,
  nan],
 'Team Member Functional Background and Involvement in Management Teams: Direct Effects and the Moderating Role of Power Centralization': ['TEAMS in the workplace',
  'DECISION making',
  'CRITICAL thinking',
  'WORKFLOW',
  'MANAGEMENT',
  'DECENTRALIZATION in management',
  'MANAGEMENT science',
  'ORGANIZATIONAL behavior',
  'DELEGATION of authority',
  'GROUP decision making',
  'STRATEGIC business units',
  nan],
 'Happy Together? How Using Nonstandard Workers Affects Exit, Voice, and Loyalty Among Standard Employees': ['LABOR supply',
  'LABOR organizing',
  'CONDUCT of life',
  'ORGANIZATIONAL behavior',
  'EMPLOYEE loyalty',
  'ORGANIZATIONAL commitment',
  'INDUSTRIAL relations',
  'ORGANIZATIONAL structure',
  'EMPLOYEES -- Attitudes',
  'PERSONNEL management',
  nan,
  nan],
 'Interpersonal Aggression in Work Groups: Social Influence, Reciprocal, and Individual Effects': ['EMPLOYEES -- Attitudes',
  'AGGRESSION (Psychology)',
  'TEAMS in the workplace',
  'SOCIAL influence',
  'INDIVIDUAL differences',
  'INTERPERSONAL relations',
  'SOCIAL context',
  'ORGANIZATIONAL behavior',
  'ORGANIZATIONAL structure',
  'WORK environment',
  nan,
  nan],
 'Share Price Reactions to Work-Family Initiatives: An Institutional Perspective': ['WORK & family',
  'PERSONNEL management',
  'STOCKHOLDERS',
  'WOMEN employees',
  'STOCKS (Finance) -- Prices',
  'MANAGEMENT science',
  'HUMAN resource accounting',
  'WOMEN -- Employment',
  'ORGANIZATIONAL behavior',
  'QUALITY of work life',
  nan,
  nan],
 'The Role of Human Capital in Postacquisition CEO Departure': ['HUMAN capital',
  'CHIEF executive officers',
  'CAPITAL investments',
  'LABOR economics',
  'CONSOLIDATION & merger of corporations',
  'EXECUTIVES -- Dismissal of',
  'ORGANIZATIONAL effectiveness',
  'ORGANIZATIONAL behavior',
  'LABOR turnover',
  'EXECUTIVE succession',
  nan,
  nan],
 'Oct/03': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
 'How Much Should I Give and How Often? The Effects of Generosity and Frequency of Favor Exchange on Social Status and Productivity': ['SOCIAL status',
  'GENEROSITY',
  'BEHAVIORAL research',
  'LABOR productivity',
  'SOCIAL exchange',
  'INTERPERSONAL relations',
  'SOCIAL factors',
  'EMPLOYEES -- Attitudes -- Research',
  'PERSONNEL management',
  'ORGANIZATIONAL behavior',
  nan,
  nan],
 'Self-Concordance at Work: Toward Understanding the Motivational Effects of Transformational Leaders': ['LEADERSHIP',
  'EXECUTIVE ability (Management)',
  'EMPLOYEE motivation',
  'MOTIVATION (Psychology)',
  'INDUSTRIAL psychology',
  'MANAGEMENT science',
  'JOB satisfaction',
  'CHARISMATIC authority',
  'SELF-congruence',
  'MANAGEMENT styles',
  nan,
  nan],
 'Cooperation, Competition, and Team Performance: Toward a Contingency Approach': ['EMPLOYEE motivation',
  'JOB performance',
  'TEAMS in the workplace',
  'INDUSTRIAL management',
  'PERSONNEL management',
  'ORGANIZATIONAL sociology',
  'INCENTIVES in industry',
  'INDUSTRIAL psychology',
  'GOAL setting in personnel management',
  'REWARD (Psychology)',
  nan,
  nan],
 'The Impact Of Expectations On Newcomer Performance In Teams As Mediated By Work Characteristics, Social Exchanges, And Empowerment': ['TEAMS in the workplace',
  'ORGANIZATIONAL sociology',
  'EMPLOYEE motivation',
  'LEADERSHIP',
  'INTERPERSONAL relations',
  'INDUSTRIAL management',
  'PYGMALION (Greek mythology)',
  'GALATEA, sea nymph (Greek deity)',
  'SOCIAL exchange',
  'OCCUPATIONAL roles',
  nan,
  nan],
 'THe Effects of Discontinuous Change on Latent Errors in Organizations: The Moderating Role of Risk': ['ORGANIZATIONAL change',
  'EMPLOYEE rules',
  'HUMAN error',
  'RISK',
  'INDUSTRIAL management',
  'PERSONNEL management',
  'ORGANIZATIONAL behavior',
  'INDUSTRIAL psychology',
  'ORGANIZATIONAL research',
  'ERROR rates',
  nan,
  nan],
 'Employee Creativity in Taiwan: An Application of Role Identity Theory': ['CREATIVE ability',
  'TAIWANESE',
  'EMPLOYEES',
  'PERSONNEL management',
  'EMPLOYEE motivation',
  'CREATIVE ability in business',
  'INNOVATION management',
  'CROSS-cultural differences',
  nan,
  nan,
  nan,
  nan],
 'Media Legitimation Effects in the Market for Initial Public Offerings': ['GOING public (Securities)',
  'CORPORATE image',
  'STOCKHOLDERS -- Attitudes',
  'CAPITALISTS & financiers',
  'MASS media',
  'CORPORATIONS -- Investor relations',
  'MATHEMATICAL statistics',
  'CORPORATIONS -- Public relations',
  'PUBLIC companies',
  'TURNOVER (Business)',
  nan,
  nan],
 'Giving Money to Get Money: How CEO Stock Options and CEO Equity Enhance IPO Valuations': ['STOCK options',
  'GOING public (Securities)',
  'INCENTIVES in industry',
  'OPTIONS (Finance)',
  'CORPORATIONS -- Valuation',
  'CORPORATIONS -- Finance',
  'EXECUTIVE compensation',
  'CAPITALISTS & financiers',
  'BUSINESS enterprises -- Valuation',
  'DECISION making',
  nan,
  nan],
 'Dec/03': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
 'A Behavioral Theory of R&D Expenditures and Innovations: Evidence from Shipbuilding': ['ORGANIZATIONAL behavior',
  'CORPORATIONS -- Finance',
  'RESEARCH & development',
  'INDUSTRIAL management',
  'INNOVATIONS in business',
  'INNOVATION management',
  'BUSINESS planning',
  'SHIPBUILDING industry',
  'TECHNOLOGICAL innovations -- Economic aspects',
  'SUCCESS in business',
  'COMPETITIVE advantage',
  'ORGANIZATIONAL change'],
 'Transformational Leadership, Conservation, and Creativity: Evidence From Korea': ['LEADERSHIP',
  'ORGANIZATIONAL behavior',
  'CREATIVE ability in business',
  'EMPLOYEE motivation',
  'ORGANIZATIONAL change',
  'WORK environment -- Psychological aspects',
  'MANAGEMENT',
  'EXECUTIVE ability (Management)',
  'INTRINSIC motivation',
  'INDUSTRIAL relations',
  'INDIVIDUAL differences',
  nan],
 'Informational Dissimilarity and Organizational Citizenship Behavior: The Role of Intrateam Interdependence and Team Identification': ['ORGANIZATIONAL behavior',
  'TEAMS in the workplace',
  'INDUSTRIAL psychology',
  'ORGANIZATIONAL effectiveness',
  'ORGANIZATIONAL goals',
  'ORGANIZATIONAL sociology',
  'SOCIAL psychology',
  'MANAGEMENT',
  'ORGANIZATIONAL change',
  'DIVISION of labor',
  'INDUSTRIAL organization',
  'WORK environment'],
 'Subsidiary Staffing in Multinational Enterprises: Agency, Resources, and Performance': ['INTERNATIONAL business enterprises -- Management',
  'FOREIGN subsidiaries -- Management',
  'EMPLOYEE selection',
  'EXECUTIVES -- Recruiting',
  'ORGANIZATIONAL sociology',
  'ORGANIZATIONAL behavior',
  'AGENCY theory',
  'RESOURCE-based theory of the firm',
  'PERSONNEL management',
  'EMPLOYMENT in foreign countries',
  'SUBSIDIARY corporations -- Management',
  'HOST countries (Business)'],
 'Strategic Human Resource Practices, Top Management Team Social Networks, and Firm Performance: The Role of Human Resource;Practices in Creating Organizational Competitive Advantage': ['PERSONNEL management',
  'COMPETITIVE advantage',
  'BUSINESS networks',
  'INDUSTRIAL management',
  'STRATEGIC planning',
  'SOCIAL networks',
  'RESOURCE management',
  'RESOURCE-based theory of the firm',
  'HUMAN capital -- Management',
  'INTELLECTUAL capital',
  'DECISION making',
  'INDUSTRIAL efficiency'],
 'Compensation Policy and Organizational Performance: The Efficiency, Operational, and Financial Implications of Pay Levels;and Pay Structure': ['COMPENSATION management',
  'ORGANIZATIONAL behavior',
  'PERSONNEL management',
  'HOSPITALS -- Administration',
  'MANAGEMENT',
  'FINANCIAL performance',
  'WAGE payment systems',
  'RESOURCE management',
  'ORGANIZATIONAL effectiveness',
  'INDUSTRIAL efficiency',
  'FINANCIAL management',
  'INDUSTRIAL management'],
 'Functional Background Identity, Diversity, and Individual Performance in Cross-Functional Teams': ['CROSS-functional teams',
  'TEAMS in the workplace',
  'GROUP identity',
  'ORGANIZATIONAL behavior',
  'MANAGEMENT',
  'PERFORMANCE',
  'PERSONNEL management',
  'COMPETITIVE advantage',
  'ORGANIZATIONAL effectiveness',
  'GROUP decision making',
  'ORGANIZATIONAL structure',
  'ORGANIZATIONAL sociology'],
 'A Customer Interaction Approach to Strategy and Production Complexity Alignment in Service Firms': ['SERVICE industries -- Management',
  'CUSTOMER relations',
  'INDUSTRIAL management',
  'PRODUCTION management',
  'STRATEGIC planning',
  'CUSTOMER services',
  'LABOR process',
  'ORGANIZATIONAL behavior',
  'DECISION making',
  'CUSTOMER satisfaction',
  'CUSTOMER orientation',
  'MARKETING strategy']}
  1. Create the adjacency matrix
In [9]:
#find the lengthe of unique values
len_keys = len(unique_val) 
#create the nxn zeros matrix with length of unique keywords 
adj_matrix = np.zeros((len_keys, len_keys), dtype = int)

#Creating the adjacency matrix by mapping the unique values with the mapped list
for row in range(0, len_keys):
  for col in range(0, len_keys):
    if row != col :
      if (adj_matrix[row][col] == 0) and (adj_matrix[col][row] == 0) :
        for title in mapped_list.keys():
          if (unique_val[row] in (mapped_list[title])) and (unique_val[col] in (mapped_list[title])):
            adj_matrix[row][col] = adj_matrix[row][col] + 1
            adj_matrix[col][row] = adj_matrix[col][row] + 1 

adj_matrix
Out[9]:
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 1, 1],
       [0, 0, 0, ..., 0, 0, 1],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 1, 0, ..., 0, 0, 1],
       [0, 1, 1, ..., 0, 1, 0]])
In [16]:
#plotting network graph with the unique keywords and adjacency matrix
network = nx.from_numpy_matrix(adj_matrix, parallel_edges=False,)
plt.figure(3,figsize=(10,6)) 
nx.draw(network,node_color='#6b5b95', node_size=400, font_color='whitesmoke')
plt.show()
In [17]:
#3,4,5 convert the adj matrix to weighted network, degree
print("No of Nodes in the Network ",network.number_of_nodes() )
print("No of Edges in the Network ",network.number_of_edges() )
df_net_degree = pd.DataFrame(network.degree, columns=['No_Of_Nodes', 'No_Of_Degree'])
df_net_degree['Keywords'] = unique_val
df_final_degree = df_net_degree.sort_values(by=['No_Of_Degree'],ascending=False)
#top 10 rows with highest degree
df_final_degree.head(10)
No of Nodes in the Network  248
No of Edges in the Network  2141
Out[17]:
No_Of_Nodes No_Of_Degree Keywords
27 27 166 ORGANIZATIONAL behavior
140 140 104 ORGANIZATIONAL effectiveness
100 100 102 MANAGEMENT science
247 247 93 PERSONNEL management
34 34 90 DECISION making
61 61 74 ORGANIZATIONAL structure
238 238 66 STRATEGIC planning
132 132 66 ORGANIZATIONAL sociology
14 14 64 INDUSTRIAL management
138 138 62 CORPORATE governance
In [18]:
#3,4,5 compute the node strength
df_net_strength = pd.DataFrame(network.degree(weight='weight'), columns=['No_Of_Nodes', 'Strength'])
df_net_strength['Keywords'] = unique_val
df_final_strength = df_net_strength.sort_values(by=['Strength'],ascending=False)
#top 10 rows with highest strength
df_final_strength.head(10)
Out[18]:
No_Of_Nodes Strength Keywords
27 27 265 ORGANIZATIONAL behavior
140 140 144 ORGANIZATIONAL effectiveness
100 100 136 MANAGEMENT science
247 247 126 PERSONNEL management
34 34 112 DECISION making
61 61 107 ORGANIZATIONAL structure
132 132 96 ORGANIZATIONAL sociology
138 138 85 CORPORATE governance
14 14 84 INDUSTRIAL management
238 238 80 STRATEGIC planning
In [19]:
#6 compute all pairs of weight
lis = []
for i in range(0,len(unique_val)):
  for j in range(0,len(unique_val)):
    lis.append([df_net_strength['Keywords'][i], df_net_strength['Keywords'][j], adj_matrix[i][j]])
df_net_weights = pd.DataFrame(lis, columns =['keyword1', 'keyword2', 'Weights'], dtype = float) 
df_final_weights = df_net_weights.sort_values(by=['Weights'],ascending=False)
#top 10 values from the dataframe
df_final_weights.head(10)
Out[19]:
keyword1 keyword2 Weights
34747 ORGANIZATIONAL effectiveness ORGANIZATIONAL behavior 11.0
6836 ORGANIZATIONAL behavior ORGANIZATIONAL effectiveness 11.0
6757 ORGANIZATIONAL behavior ORGANIZATIONAL structure 9.0
15155 ORGANIZATIONAL structure ORGANIZATIONAL behavior 9.0
61283 PERSONNEL management ORGANIZATIONAL behavior 8.0
6943 ORGANIZATIONAL behavior PERSONNEL management 8.0
6796 ORGANIZATIONAL behavior MANAGEMENT science 7.0
24827 MANAGEMENT science ORGANIZATIONAL behavior 7.0
34251 CORPORATE governance ORGANIZATIONAL behavior 6.0
6730 ORGANIZATIONAL behavior DECISION making 6.0
In [20]:
#7. Plot average strength on y-axis and degree on x-axis
sns.lineplot(x=df_net_degree['No_Of_Degree'], y=df_net_strength['Strength'])
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1ef0547af0>
In [21]:
#using plotly library
fig = px.line(x=df_net_degree['No_Of_Degree'], y=df_net_strength['Strength'], title='Strength vs Degree')
fig.show()

Task - 2

In [184]:
import nltk
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from cleantext import clean
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
STOPWORDS = list(set(stopwords.words('english')))
extra_stopwords = ['@','$','.',',','!',"'s",'i','I','&','%','*','"','#','(',')','[',']','{','}','/','?','<','>','`','~','-','_','+','=',' & ',' &','& ','the','3','it','its',"it's",'this','that.','that','...','we','It','yes','no','if','we','If','We',';',':' ' (',')','[',']','{','}','m','t','d']
stop_words = STOPWORDS + extra_stopwords
print(stop_words)
['yours', "you'll", "should've", 'you', 'each', 'aren', 'o', 'such', 'i', 'yourselves', 'but', 'weren', 'there', 't', 'with', 'below', 'itself', 'ma', 'its', 'himself', 'he', 'when', 'll', 'was', 'above', 'or', 'this', 'are', 'than', 'just', 'their', 'myself', 'being', 'having', 'after', 'whom', "you're", 'should', 'those', 'hadn', 'couldn', "haven't", 'it', 'has', 'does', 'nor', "she's", "shouldn't", 'won', 'my', 'wouldn', 'hers', "isn't", 'where', 'some', 'if', 'what', 'off', 'doing', 'herself', 'only', 'mustn', 'didn', 're', 'not', 'y', 'against', "doesn't", 'during', 'while', 'once', 'd', 'will', "couldn't", "hasn't", 'your', 'them', 'needn', 'ain', 'out', 'up', 'we', 'am', 'more', 'me', 'by', 'our', 'all', 'haven', 'shan', 'few', 'too', 'ours', 'in', 'can', 'were', "mustn't", 'before', 'him', 'his', 'm', 'until', "hadn't", "weren't", 's', 'doesn', 'at', 'theirs', 'most', 'of', 'had', "you've", 'any', 'so', "wouldn't", 'that', 'under', 'yourself', 'through', 'is', 'have', 'for', 'down', "didn't", 'an', 'both', 'same', "shan't", 'between', 'to', 'the', 'very', 'on', 'wasn', 'further', "wasn't", 'do', 'ourselves', 'now', "mightn't", 'no', 'mightn', "aren't", 'been', 'about', 'as', 'she', 'here', 'from', "that'll", 'these', 'into', 'they', "it's", "don't", "you'd", 'hasn', 'a', 'don', 'did', "needn't", 'who', 'shouldn', 'other', 'her', 'how', 'be', 'because', 'which', 'own', 'over', 'then', 'why', 'and', 've', 'again', 'isn', 'themselves', "won't", '@', '$', '.', ',', '!', "'s", 'i', 'I', '&', '%', '*', '"', '#', '(', ')', '[', ']', '{', '}', '/', '?', '<', '>', '`', '~', '-', '_', '+', '=', ' & ', ' &', '& ', 'the', '3', 'it', 'its', "it's", 'this', 'that.', 'that', '...', 'we', 'It', 'yes', 'no', 'if', 'we', 'If', 'We', ';', ': (', ')', '[', ']', '{', '}', 'm', 't', 'd']
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
In [185]:
#Mounting files from google drive
from google.colab import drive
drive.mount('/content/drive/')
%cd /content/drive/MyDrive/FDA_Project_3/tweet_datasets
#Changing directory for files and converting the file names to list
dir = '/content/drive/MyDrive/FDA_Project_3/tweet_datasets'
#It gives us the list of all the files present in the current directory
files_dir_list = sorted(os.listdir(dir))
print(pd.DataFrame(files_dir_list,columns=['File_Names']))
Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).
/content/drive/MyDrive/FDA_Project_3/tweet_datasets
   File_Names
0    2010.csv
1    2011.csv
2    2012.csv
3    2013.csv
4    2014.csv
5    2015.csv
6    2016.csv
7    2017.csv
8    2018.csv
9    2019.csv
10   2020.csv
11   2021.csv
12   2022.csv
In [186]:
#getting the all years data into one dataframe
df_tweets_tempdata = pd.concat((pd.read_csv(f) for f in files_dir_list), ignore_index=True)
df_tweets_tempdata.head(10)
Out[186]:
Unnamed: 0 id conversation_id created_at date timezone place tweet language hashtags ... reply_to retweet_date translate trans_src trans_dest time mentions replies_count retweets_count likes_count
0 0.0 15434727182 15434727182 1275676317000.0 2010-06-04 18:31:57 0 NaN Please ignore prior tweets, as that was someon... en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 0.0 152153637639028736 152151847614943233 1325111228000.0 2011-12-28 22:27:08 0 NaN @TheOnion So true :) en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1.0 151809315026636800 151809315026636800 1325029135000.0 2011-12-27 23:38:55 0 NaN If you ever wanted to know the *real* truth ab... en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2.0 151338939389706242 151338939389706242 1324916990000.0 2011-12-26 16:29:50 0 NaN Walked around a neighborhood recently rebuilt ... en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 3.0 151337237429239808 151337237429239808 1324916584000.0 2011-12-26 16:23:04 0 NaN It was Xmas, so we brought presents for the ki... en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 4.0 151327734843445249 151327734843445249 1324914318000.0 2011-12-26 15:45:18 0 NaN Met with UNICEF, Doctors Without Borders and A... en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 5.0 151322293174419456 151322293174419456 1324913020000.0 2011-12-26 15:23:40 0 NaN Just returned from a trip to Haiti. Covered a ... en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 6.0 151317672217419777 151317672217419777 1324911919000.0 2011-12-26 15:05:19 0 NaN Single character Tweets are the ulitmate exten... en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 7.0 151151777662779392 151151777662779392 1324872366000.0 2011-12-26 04:06:06 0 NaN I und [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 8.0 150390624552615937 150390624552615937 1324690893000.0 2011-12-24 01:41:33 0 NaN The Russians are having some challenges with t... en [] ... [] NaN NaN NaN NaN NaN NaN NaN NaN NaN

10 rows × 44 columns

In [187]:
#extracting only required columns from whole dataset
df_tweets_data = df_tweets_tempdata[['id','created_at','date','tweet','language','username','nlikes','nreplies','nretweets']]
df_tweets_data['date'] = pd.to_datetime(df_tweets_data['date'])
df_tweets_data['year'] = pd.DatetimeIndex(df_tweets_data['date']).year
df_tweets_data.dtypes
df_tweets_data['tweet']
Out[187]:
0        Please ignore prior tweets, as that was someon...
1                                     @TheOnion So true :)
2        If you ever wanted to know the *real* truth ab...
3        Walked around a neighborhood recently rebuilt ...
4        It was Xmas, so we brought presents for the ki...
                               ...                        
34873                              https://t.co/LA9hPzVlGx
34874                  Let’s make the roaring 20’s happen!
34875                  Great work by Tesla team worldwide!
34876                                    @BLKMDL3 @Tesla 🔥
34877                    @MiFSDBetaTester @WholeMarsBlog 🤣
Name: tweet, Length: 34878, dtype: object
In [188]:
#Consider the data only from year 2017 to 2022
#Filtering the data based on year [2017,2018,2019,2020,2021,2022]
list_of_years = [2017,2018,2019,2020,2021,2022]
df_tweets_data = df_tweets_data[df_tweets_data['year'].isin(list_of_years)]
print("Tweets count per year: ")
df_tweets_data.year.value_counts()
Tweets count per year: 
Out[188]:
2018    6861
2019    5789
2017    3483
2020    3330
2021    3115
2022    1028
Name: year, dtype: int64
In [189]:
#removing stopwords and emojis from the tweets
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet'].apply(lambda tweet: ' '.join([word for word in tweet.split() if word not in (stop_words)]))
df_tweets_data['tweet_without_stopwords'] = df_tweets_data.apply(lambda x: clean(x.tweet_without_stopwords, no_emoji = True), axis = 1)
df_tweets_data.drop(columns=['id','created_at','date'],axis=1,inplace=True)
df_tweets_data = df_tweets_data.reset_index(drop = True)
df_tweets_data
Out[189]:
tweet language username nlikes nreplies nretweets year tweet_without_stopwords
0 @neilsiegel @Tesla Coming very soon en elonmusk 2319.0 113.0 66.0 2017 @neilsiegel @tesla coming soon
1 @Kreative Vastly better maps/nav coming soon en elonmusk 2898.0 64.0 81.0 2017 @kreative vastly better maps/nav coming soon
2 @dd_hogan Ok und elonmusk 2707.0 29.0 91.0 2017 @dd_hogan ok
3 @Jason @Tesla Sure en elonmusk 4698.0 107.0 115.0 2017 @jason @tesla sure
4 @kabirakhtar Yeah, it’s terrible. Had to upgra... en elonmusk 3139.0 66.0 95.0 2017 @kabirakhtar yeah, it's terrible. had upgrade ...
... ... ... ... ... ... ... ... ...
23601 https://t.co/LA9hPzVlGx und elonmusk NaN NaN NaN 2022 https://t.co/la9hpzvlgx
23602 Let’s make the roaring 20’s happen! en elonmusk NaN NaN NaN 2022 let's make roaring 20's happen!
23603 Great work by Tesla team worldwide! en elonmusk NaN NaN NaN 2022 great work tesla team worldwide!
23604 @BLKMDL3 @Tesla 🔥 und elonmusk NaN NaN NaN 2022 @blkmdl3 @tesla
23605 @MiFSDBetaTester @WholeMarsBlog 🤣 und elonmusk NaN NaN NaN 2022 @mifsdbetatester @wholemarsblog

23606 rows × 8 columns

In [190]:
#1. Compute word frequencies for each year. Exclude the stop words
years = [2017,2018,2019,2020,2021,2022]
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace('&','')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace('@','')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace('the','')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace("it's",'')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace("this",'')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace("will",'')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace("would",'')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace("yes",'')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'].str.replace("that's",'')
df_tweets_data['tweet_without_stopwords'] = df_tweets_data['tweet_without_stopwords'] .replace(r'http\S+', '', regex=True).replace(r'www\S+', '', regex=True)

for year in years:
  df_tweets_data_year = df_tweets_data[df_tweets_data['year']==year]
  # Calculating the frequency of each word
  res = df_tweets_data_year['tweet_without_stopwords'].str.split(expand=True).stack().value_counts()
  df_word_freq_year = res.to_frame().reset_index()
  df_word_freq_year = df_word_freq_year.rename(columns= {'index': 'words',0: 'count'})
  print('Word frequnecy per year: ',year)
  print(df_word_freq_year.sort_values(by=['count'],ascending=False))
Word frequnecy per year:  2017
              words  count
0             tesla    219
1             model    177
2            spacex    147
3              like    132
4              next    126
...             ...    ...
2705  lollieshopmom      3
2706          stop!      3
2707           vary      3
2708       country,      3
4949     completed.      3

[4950 rows x 2 columns]
Word frequnecy per year:  2018
               words  count
0              tesla   1252
1              model    294
2                car    276
3               like    273
4               good    257
...              ...    ...
5140    advertisers.      3
5139  teddymonacelli      3
5138       mistrust.      3
8937            (due      2
8938         though)      2

[8939 rows x 2 columns]
Word frequnecy per year:  2019
               words  count
0              tesla   1102
1     erdayastronaut    520
2             spacex    411
3            flcnhvy    364
4      teslaownerssv    182
...              ...    ...
8539         actions      1
8540            all,      1
8541             mom      1
8542           wish,      1
8674       everyone.      1

[8675 rows x 2 columns]
Word frequnecy per year:  2020
                words  count
0               tesla    322
1      erdayastronaut    247
2             flcnhvy    239
3            ppathole    225
4              spacex    198
...               ...    ...
5587           hard).      1
5586        graduated      1
5585  party/hackathon      1
5584      (obviously)      1
9545           carlos      1

[9546 rows x 2 columns]
Word frequnecy per year:  2021
              words  count
0             tesla    306
1            spacex    224
2     wholemarsblog    189
3     teslaownerssv    119
4          ppathole    116
...             ...    ...
4993    experience!      1
4992      audiences      1
4991         10,000      1
4990            -15      1
8550       drivers,      1

[8551 rows x 2 columns]
Word frequnecy per year:  2022
                words  count
0               tesla    100
1       wholemarsblog     84
2              spacex     56
3       teslaownerssv     50
4       sawyermerritt     36
...               ...    ...
1068          nutaiie      2
1067         annoying      2
1066          diapers      2
1065            true.      2
2054  mifsdbetatester      2

[2055 rows x 2 columns]
In [140]:
#2. Show top 10 words (for each year) by the highest value of word frequency
for year in years:
  df_tweets_data_year = df_tweets_data[df_tweets_data['year']==year]
  # Calculating the frequency of each word
  res = df_tweets_data_year['tweet_without_stopwords'].str.split(expand=True).stack().value_counts()
  df_word_freq_year = res.to_frame().reset_index()
  df_word_freq_year = df_word_freq_year.rename(columns= {0: 'count'})
  df_word_freq_year = df_word_freq_year.rename(columns= {'index': 'words'})
  print('Top 10 word frequnecy per year: ',year)
  print(df_word_freq_year.head(10))
  print('\n')
  # who v/s fare barplot
  sns.set(rc={"figure.figsize":(23, 8)})
  plt.title(f'Top Words and frequencies for the year {year}', size = 16)
  sns.barplot(x = 'words',y = 'count',data = df_word_freq_year.head(20))
  # Show the plot
  plt.show()
  print('\n')
Top 10 word frequnecy per year:  2017
    words  count
0   tesla    219
1   model    177
2  spacex    147
3    like    132
4    next    126
5    good    117
6    just    117
7     one    117
8   first    108
9  boring    102



Top 10 word frequnecy per year:  2018
    words  count
0   tesla   1252
1   model    294
2     car    276
3    like    273
4    good    257
5  spacex    249
6   don't    216
7    even    195
8    next    189
9    make    177



Top 10 word frequnecy per year:  2019
            words  count
0           tesla   1102
1  erdayastronaut    520
2          spacex    411
3         flcnhvy    364
4   teslaownerssv    182
5       teslarati    175
6            like    170
7        starship    160
8           great    149
9            good    147



Top 10 word frequnecy per year:  2020
            words  count
0           tesla    322
1  erdayastronaut    247
2         flcnhvy    239
3        ppathole    225
4          spacex    198
5   thirdrowtesla    122
6   teslaownerssv    116
7           great    109
8            much    107
9            good    102



Top 10 word frequnecy per year:  2021
            words  count
0           tesla    306
1          spacex    224
2   wholemarsblog    189
3   teslaownerssv    119
4        ppathole    116
5  erdayastronaut    111
6            much     92
7            good     89
8            like     87
9           great     86



Top 10 word frequnecy per year:  2022
           words  count
0          tesla    100
1  wholemarsblog     84
2         spacex     56
3  teslaownerssv     50
4  sawyermerritt     36
5       billym2k     30
6       starlink     28
7         people     28
8           good     26
9   gailalfaratx     26



In [217]:
#3. Plot Histogram of word frequencies
for year in years:
  df_tweets_data_year = df_tweets_data[df_tweets_data['year']==year]
  # Calculating the frequency of each word
  res = df_tweets_data_year['tweet_without_stopwords'].str.split(expand=True).stack().value_counts()
  df_word_freq_year = res.to_frame().reset_index()
  df_word_freq_year = df_word_freq_year.rename(columns= {0: 'count'})
  df_word_freq_year = df_word_freq_year.rename(columns= {'index': 'words'})
  print('\n')
  print('Ploting histogram of word frequnecy per year: ',year)
  fig = sns.histplot(df_word_freq_year, x ='count', bins =25)
  plt.show()
  print('\n')

Ploting histogram of word frequnecy per year:  2017



Ploting histogram of word frequnecy per year:  2018



Ploting histogram of word frequnecy per year:  2019



Ploting histogram of word frequnecy per year:  2020



Ploting histogram of word frequnecy per year:  2021



Ploting histogram of word frequnecy per year:  2022

In [142]:
#4. Method-1 Use Zipf's law and plot log-log plots of word frequencies and rank for each year
for year in years:
  df_tweets_data_year = df_tweets_data[df_tweets_data['year']==year]
  # Calculating the frequency of each word
  res = df_tweets_data_year['tweet_without_stopwords'].str.split(expand=True).stack().value_counts()
  df_word_freq_year = res.to_frame().reset_index()
  df_word_freq_year = df_word_freq_year.rename(columns= {0: 'count'})
  df_word_freq_year = df_word_freq_year.rename(columns= {'index': 'words'})
  df_word_freq_year['rank'] = range(1,len(df_word_freq_year)+1)
  print('\n')
  print('Ploting log-log plots of word frequencies and rank for each year: ',year)
  plt.loglog(df_word_freq_year['count'], df_word_freq_year['rank'])
  plt.xlabel('Word Frequency')
  plt.ylabel('Rank')
  plt.show()

Ploting log-log plots of word frequencies and rank for each year:  2017

Ploting log-log plots of word frequencies and rank for each year:  2018

Ploting log-log plots of word frequencies and rank for each year:  2019

Ploting log-log plots of word frequencies and rank for each year:  2020

Ploting log-log plots of word frequencies and rank for each year:  2021

Ploting log-log plots of word frequencies and rank for each year:  2022
In [143]:
#4. Method-2 Use Zipf's law and plot log-log plots of word frequencies and rank for each year
import scipy.stats as ss
import math
# Show top 10 words (for each year) by the highest value of word frequency
for year in years:
  df_tweets_data_year = df_tweets_data[df_tweets_data['year']==year]
  # Calculating the frequency of each word
  res = df_tweets_data_year['tweet_without_stopwords'].str.split(expand=True).stack().value_counts()
  df_word_freq_year = res.to_frame().reset_index()
  df_word_freq_year = df_word_freq_year.rename(columns= {0: 'count'})
  df_word_freq_year = df_word_freq_year.rename(columns= {'index': 'words'})
  rank_word = ss.rankdata([freq for freq in df_word_freq_year['count'].values.tolist()])
  frequencies = [math.log(freq) for freq in df_word_freq_year['count'].values.tolist()]
  rank = [math.log(rank) for rank in rank_word]
  fig, ax = plt.subplots(figsize = (20,6))
  plt.plot(frequencies, rank, 'bo')
  print('\n')
  plt.title(f"Using zipf's law Log-log plot of word frequencies and rank for the year {year}", size = 16)
  plt.xlabel('Word Frequency', size = 14)
  plt.ylabel('Word Rank', size = 14)
  plt.show()
  print('\n')

















In [214]:
#5. Method-1  Create bigram network graphs for each year
import nltk
from nltk import bigrams
import itertools
import collections
# Assuming your dataframe is called df and it has a column called 'text'
from nltk.tokenize import RegexpTokenizer
def tokenize_text(text):
    tokenizer = RegexpTokenizer(r'\w+')
    tokens = tokenizer.tokenize(text)
    # tokens = nltk.word_tokenize(text)
    return tokens

for year in years:
  df_tweets_data_year = df_tweets_data[df_tweets_data['year']==2017]
  df_tweets_data_year['tokens'] = df_tweets_data_year['tweet_without_stopwords'].apply(tokenize_text)
  df_tweets_data_year['tokens']
  bigrams_list = [list(nltk.bigrams(tokens)) for tokens in df_tweets_data_year['tokens']]
  # Flatten list of bigrams in clean tweets
  bigrams = list(itertools.chain(*bigrams_list))
  # Create counter of words in clean bigrams
  bigram_counts = collections.Counter(bigrams)
  bigram_counts.most_common(50)
  bigram_df = pd.DataFrame(bigram_counts.most_common(50),
                              columns=['bigram', 'count'])
  
  # Create dictionary of bigrams and their counts
  d = bigram_df.set_index('bigram').T.to_dict('records')
  print(f"Bigram network graphs for each year {year}")
  print(bigram_df.head(15))
  # Create network plot 
  G = nx.Graph()

  # Create connections between nodes
  for k, v in d[0].items():
      G.add_edge(k[0], k[1], weight=(v * 10))

  G.add_node("tesla", weight=100)
  fig, ax = plt.subplots(figsize=(20, 10))

  pos = nx.spring_layout(G, k=2)

  # Plot networks
  nx.draw_networkx(G, pos,
                  font_size=16,
                  width=3,
                  edge_color='grey',
                  node_color='purple',
                  with_labels = False,
                  ax=ax)

  # Create offset labels
  for key, value in pos.items():
      x, y = value[0]+.135, value[1]+.045
      ax.text(x, y,
              s=key,
              bbox=dict(facecolor='red', alpha=0.25),
              horizontalalignment='center', fontsize=13)
  # nx.draw(G, pos=pos, with_labels=True, node_size=500, font_size=12, width=[bigram_df['count']*0.1 for (u,v,d) in G.edges(data=True)])
  # nx.draw_networkx_edge_labels(G, pos=pos, font_size=8)
  plt.show()
  print('\n')
Bigram network graphs for each year 2017
               bigram  count
0              (i, m)     60
1      (coming, soon)     45
2     (falcon, heavy)     45
3          (model, s)     45
4            (can, t)     42
5   (boring, company)     39
6         (falcon, 9)     36
7            (don, t)     36
8             (i, ve)     33
9        (next, year)     30
10      (good, point)     27
11          (you, re)     24
12         (model, x)     24
13       (next, week)     24
14      (next, month)     21

Bigram network graphs for each year 2018
               bigram  count
0              (i, m)     60
1      (coming, soon)     45
2     (falcon, heavy)     45
3          (model, s)     45
4            (can, t)     42
5   (boring, company)     39
6         (falcon, 9)     36
7            (don, t)     36
8             (i, ve)     33
9        (next, year)     30
10      (good, point)     27
11          (you, re)     24
12         (model, x)     24
13       (next, week)     24
14      (next, month)     21

Bigram network graphs for each year 2019
               bigram  count
0              (i, m)     60
1      (coming, soon)     45
2     (falcon, heavy)     45
3          (model, s)     45
4            (can, t)     42
5   (boring, company)     39
6         (falcon, 9)     36
7            (don, t)     36
8             (i, ve)     33
9        (next, year)     30
10      (good, point)     27
11          (you, re)     24
12         (model, x)     24
13       (next, week)     24
14      (next, month)     21

Bigram network graphs for each year 2020
               bigram  count
0              (i, m)     60
1      (coming, soon)     45
2     (falcon, heavy)     45
3          (model, s)     45
4            (can, t)     42
5   (boring, company)     39
6         (falcon, 9)     36
7            (don, t)     36
8             (i, ve)     33
9        (next, year)     30
10      (good, point)     27
11          (you, re)     24
12         (model, x)     24
13       (next, week)     24
14      (next, month)     21

Bigram network graphs for each year 2021
               bigram  count
0              (i, m)     60
1      (coming, soon)     45
2     (falcon, heavy)     45
3          (model, s)     45
4            (can, t)     42
5   (boring, company)     39
6         (falcon, 9)     36
7            (don, t)     36
8             (i, ve)     33
9        (next, year)     30
10      (good, point)     27
11          (you, re)     24
12         (model, x)     24
13       (next, week)     24
14      (next, month)     21

Bigram network graphs for each year 2022
               bigram  count
0              (i, m)     60
1      (coming, soon)     45
2     (falcon, heavy)     45
3          (model, s)     45
4            (can, t)     42
5   (boring, company)     39
6         (falcon, 9)     36
7            (don, t)     36
8             (i, ve)     33
9        (next, year)     30
10      (good, point)     27
11          (you, re)     24
12         (model, x)     24
13       (next, week)     24
14      (next, month)     21

In [144]:
#5. Method-2  Create bigram network graphs for each year
#In order to reduce the execution time I have taken only 200 elements for the plots
for year in years:
  df_tweets_data_year = df_tweets_data[df_tweets_data['year']==year]
  # Calculating the frequency of each word
  res = df_tweets_data_year['tweet_without_stopwords'].str.split(expand=True).stack().value_counts()
  df_word_freq_year = res.to_frame().reset_index()
  df_word_freq_year = df_word_freq_year.rename(columns= {0: 'count'})
  df_word_freq_year = df_word_freq_year.rename(columns= {'index': 'words'})
  bi_words = list(nltk.bigrams(df_word_freq_year.words))
  bi_analysis = nltk.FreqDist(bi_words[:200])
  print(f"Bigram network graphs for each year {year}")
  print('\n')
  Net_Graph = nx.Graph()
  for index, row in bi_analysis.most_common():
    Net_Graph.add_weighted_edges_from([(index[0], index[1], row)])  
  plt.figure(figsize=(20,10))
  options = {
      'edge_color': '#987654',
      'width': 1.5,
      'with_labels': True,
      'font_weight': 'bold',
  }

  nx.draw(Net_Graph, pos=nx.spring_layout(Net_Graph, k=0.25, iterations=10), **options)
  axes = plt.gca()
  axes.collections[0].set_edgecolor("#987654") 
  plt.show()
  print('\n')
Bigram network graphs for each year 2017



Bigram network graphs for each year 2018



Bigram network graphs for each year 2019



Bigram network graphs for each year 2020



Bigram network graphs for each year 2021



Bigram network graphs for each year 2022



In [220]:
#Additional Analysis of tweets 
# Drop rows that has NaN values on selected columns
df_cleaned_likes=df_tweets_data.dropna(subset=['nlikes','nreplies','nretweets'])
df_cleaned_likes.drop(columns=['language','username','tweet_without_stopwords'],inplace=True)
df_cleaned_likes.drop_duplicates(subset=['tweet'],inplace=True)

#Plotting the bar garph to visualize the top liked tweets based on year 2017
df_cleaned_likes_1 = df_cleaned_likes[df_cleaned_likes['year']==2017]
sorted_df = df_cleaned_likes_1.sort_values(by=['nlikes'],ascending=False)
top_20_df = sorted_df.iloc[:20]
print('\n')
fig = px.bar(top_20_df, x='tweet', y='nlikes',title="Top likes for tweets in each year 2017")
fig.update_layout(
    xaxis = {
     'tickmode': 'array',
     'tickvals': list(range(len(top_20_df))),
     'ticktext': top_20_df['tweet'].str.slice(-30).tolist(),
    }
)
fig.show()

In [147]:
# Drop rows that has NaN values on selected columns
df_cleaned_likes=df_tweets_data.dropna(subset=['nlikes','nreplies','nretweets'])
df_cleaned_likes.drop(columns=['language','username','tweet_without_stopwords'],inplace=True)
df_cleaned_likes.drop_duplicates(subset=['tweet'],inplace=True)
#Plotting the bar garph to visualize the top liked tweets based on year 2018
df_cleaned_likes_2 = df_cleaned_likes[df_cleaned_likes['year']==2018]
sorted_df = df_cleaned_likes_2.sort_values(by=['nlikes'],ascending=False)
top_20_df = sorted_df.iloc[:20]
print('\n')
fig = px.bar(top_20_df, x='tweet', y='nlikes',title="Top likes for tweets in each year 2018")
fig.update_layout(
    xaxis = {
     'tickmode': 'array',
     'tickvals': list(range(len(top_20_df))),
     'ticktext': top_20_df['tweet'].str.slice(-30).tolist(),
    }
)
fig.show()

In [148]:
# Drop rows that has NaN values on selected columns
df_cleaned_likes=df_tweets_data.dropna(subset=['nlikes','nreplies','nretweets'])
df_cleaned_likes.drop(columns=['language','username','tweet_without_stopwords'],inplace=True)
df_cleaned_likes.drop_duplicates(subset=['tweet'],inplace=True)
#Plotting the bar garph to visualize the top liked tweets based on year 2019
df_cleaned_likes_3 = df_cleaned_likes[df_cleaned_likes['year']==2019]
sorted_df = df_cleaned_likes_3.sort_values(by=['nlikes'],ascending=False)
top_20_df = sorted_df.iloc[:20]
print('\n')
fig = px.bar(top_20_df, x='tweet', y='nlikes',title="Top likes for tweets in each year 2019")
fig.update_layout(
    xaxis = {
     'tickmode': 'array',
     'tickvals': list(range(len(top_20_df))),
     'ticktext': top_20_df['tweet'].str.slice(-30).tolist(),
    }
)
fig.show()

In [149]:
# Drop rows that has NaN values on selected columns
df_cleaned_likes=df_tweets_data.dropna(subset=['nlikes','nreplies','nretweets'])
df_cleaned_likes.drop(columns=['language','username','tweet_without_stopwords'],inplace=True)
df_cleaned_likes.drop_duplicates(subset=['tweet'],inplace=True)
#Plotting the bar garph to visualize the top liked tweets based on year 2020
df_cleaned_likes_4 = df_cleaned_likes[df_cleaned_likes['year']==2020]
sorted_df = df_cleaned_likes_4.sort_values(by=['nlikes'],ascending=False)
top_20_df = sorted_df.iloc[:20]
print('\n')
fig = px.bar(top_20_df, x='tweet', y='nlikes',title="Top likes for tweets in each year 2020")
fig.update_layout(
    xaxis = {
     'tickmode': 'array',
     'tickvals': list(range(len(top_20_df))),
     'ticktext': top_20_df['tweet'].str.slice(-30).tolist(),
    }
)
fig.show()

In [ ]:
!pwd
%cd /content
!pwd
!jupyter nbconvert --to html 'Project_3_group_16.ipynb'